Adaptive Dictionary for Bilingual Lexicon Extraction from Comparable Corpora
نویسندگان
چکیده
One of the main resources used for the task of bilingual lexicon extraction from comparable corpora is : the bilingual dictionary, which is considered as a bridge between two languages. However, no particular attention has been given to this lexicon, except its coverage, and the fact that it can be issued from the general language, the specialised one, or a mix of both. In this paper, we want to highlight the idea that a better consideration of the bilingual dictionary by studying its entries and filtering the non-useful ones, leads to a better lexicon extraction and thus, reach a higher precision. The experiments are conducted on a medical domain corpora. The French-English specialised corpus ’breast cancer’ of 1 million words. We show that the empirical results obtained with our filtering process improve the standard approach traditionally dedicated to this task and are promising for future work.
منابع مشابه
A Combination of Models for Bilingual Lexicon Extraction from Comparable Corpora
In this paper we present a method to extract bilingual terminologies from comparable non-aligned corpora, by using multiple linguistic knowledge sources, such as: non-parallel corpora, bilingual thesauri, a preliminary bilingual dictionary, etc... We focus on two core technologies: bilingual lexicon extraction from comparable corpora and expansion through thesauri categories based on different ...
متن کاملContext Vector Disambiguation for Bilingual Lexicon Extraction from Comparable Corpora
This paper presents an approach that extends the standard approach used for bilingual lexicon extraction from comparable corpora. We focus on the unresolved problem of polysemous words revealed by the bilingual dictionary and introduce a use of a Word Sense Disambiguation process that aims at improving the adequacy of context vectors. On two specialized FrenchEnglish comparable corpora, empiric...
متن کاملTowards a Generic Approach for Bilingual Lexicon Extraction from Comparable Corpora
This paper presents an approach that extends the standard approach used for bilingual lexicon extraction from comparable corpora. We focus on the problem associated to polysemous words found in the seed bilingual lexicon when translating source context vectors. To improve the adequacy of context vectors, the use of a WordNetbased Word Sense Disambiguation process is tested. Experimental results...
متن کاملAdapted Seed Lexicon and Combined Bidirectional Similarity Measures for Translation Equivalent Extraction from Comparable Corpora
An improved method for extracting translation equivalents from bilingual comparable corpora according to contextual similarity was developed. This method has two main features. First, a seed bilingual lexiconwhich is used to bridge contexts in different languagesis adapted to the corpora from which translation equivalents are to be extracted. Second, the contextual similarity is evaluated by ...
متن کاملMeasuring Comparability of Multilingual Corpora Extracted from Wikipedia
Comparable corpora can be used for many linguistic tasks such as bilingual lexicon extraction. By improving the quality of comparable corpora, we improve the quality of the extraction. This article describes some strategies to build comparable corpora from Wikipedia and proposes a measure of comparability. Experiments were performed on Portuguese, Spanish, and English Wikipedia.
متن کامل